Abstract

Asynchronous executions of a distributed algorithm differ from each other due to the nondeterminism 
in the order in which the messages exchanged are handled. In many situations of interest, the asynchronous
executions induced by restricting nondeterminism are more efficient, in an application-specific
sense, than the others. In this work, we define partially ordered executions of a distributed algorithm 
as the executions satisfying some restricted orders of their actions in two different frameworks, those 
of the so-called event- and pulse-driven computations. The aim of these restrictions is to characterize 
asynchronous executions that are likely to be more efficient for some important classes of applications. 
Also, an asynchronous algorithm that ensures the occurrence of partially ordered executions is given for 
each case. Two of the applications that we believe may benefit from the restricted nondeterminism are 
backtrack search, in the event-driven case, and iterative algorithms for systems of linear equations, in 
the pulse-driven case. 
1 Introduction



We consider a system represented by a connected undirected graph G = (N, E), where N = {1, 2, ..., n}
stands for a set of nodes and E for a set of point-to-point bidirectional communication channels. A channel
involving two distinct nodes i and j is denoted by ij. For i ∈ N, we let N(i) = {j | ij ∈ E} comprise the
neighbors of i in G. In the system represented by G, a node is able to sequentially perform computations and 
to interact with neighbors solely by sending or receiving messages through the channels incident to it. Every 
node has its own local, independent clock, but has no access to a global clock of any kind. All channels are 
reliable, which means that every message is delivered to its destination with finite but unpredictable delay. 
This configuration characterizes the distributed and asynchronous nature of the system represented by G. 

A distributed computation carried out on G is fully described by an initial global state (comprising an 
initial state for each node and no messages in transit), the local computations performed by each node, and 
the interactions among nodes. The local computation of a node and the messages received from neighbors 
in G determine the evolution of its local state. Motivated by a number of applications, we consider two 
categories of distributed computations, depending upon what governs the local state transitions of the nodes. 
In the first category, that of event-driven computations, each node reacts whenever it receives a message by
performing a local computation, as shown in Algorithm 1. More precisely, the receiving of a message from
a neighbor affects node i's local state by means of the execution of procedure EVENTi, which encapsulates
the actions of the particular computation associated with node i. Besides changing the local state of i, the
execution of EVENTi produces as a result a set MSGi of messages, possibly empty. Each message in MSGi,
if any, has one of i's neighbors as destination. This framework is widely adopted in the description and 
analysis of asynchronous distributed algorithms for several applications [2] . 

Algorithm 1 Outline of the computation at node i in the event-driven framework.
1: Set initial state of i
2: EVENTi(-, MSGi)
3: Send each message in MSGi to a specified neighbor
4: while global termination is not known to i do
5:     if msgi has arrived from a neighbor of i then
6:         EVENTi(msgi, MSGi)
7:         Send each message in MSGi to a specified neighbor
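
As an illustration of Algorithm 1, the following Python sketch simulates the event-driven loop on all nodes of a small graph, delivering the messages in transit in an arbitrary (pseudo-random) order. The names run_event_driven, init_state, and event, the max-flooding example, and the use of "no message in transit" in place of termination detection are assumptions made for the sketch; none of them is part of the framework itself.

```python
import random

def run_event_driven(neighbors, event, init_state, seed=0, max_steps=10_000):
    """Simulate Algorithm 1 at every node with arbitrary message delivery order.

    neighbors:  dict mapping each node to the list of its neighbors.
    event:      event(i, state, msg) -> list of (dest, payload); msg is None
                for the initial call (line 2) and (origin, payload) otherwise.
    Termination detection is replaced by "no message in transit", which is a
    simulation shortcut and not something a real node could observe.
    """
    rng = random.Random(seed)
    state = {i: init_state(i) for i in neighbors}      # line 1
    in_transit = []                                    # (origin, dest, payload)
    for i in neighbors:                                # lines 2-3
        for dest, payload in event(i, state[i], None):
            assert dest in neighbors[i]
            in_transit.append((i, dest, payload))
    steps = 0
    while in_transit and steps < max_steps:            # lines 4-7
        # Nondeterminism: any message in transit may be the next one handled.
        origin, dest, payload = in_transit.pop(rng.randrange(len(in_transit)))
        for d, p in event(dest, state[dest], (origin, payload)):
            in_transit.append((dest, d, p))
        steps += 1
    return state

# Hypothetical example: flood the largest node identifier over a path graph.
neighbors = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}

def init_state(i):
    return {"best": i}

def event(i, state, msg):
    if msg is None:                                    # initial event
        return [(j, state["best"]) for j in neighbors[i]]
    _, value = msg
    if value > state["best"]:
        state["best"] = value
        return [(j, value) for j in neighbors[i]]
    return []

final = run_event_driven(neighbors, event, init_state)
```

In this example every node ends with the largest identifier whatever the delivery order is, while the number of messages exchanged may vary with the order.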

The second category is that of pulse-driven computations, and is motivated by applications in which 
the initial state evolves towards a final state in phases, as in many iterative algorithms. To model such a 
behavior, we assume that a mechanism that generates a sequence of pulses is provided to govern evolution 
at each node i. Such a mechanism is given in Algorithm 2 by two functions, namely GETCURRENTi and
HASADVANCEDi. The former is used to notify the pulse generation mechanism that node i has started the
local computations associated with the most recent pulse generated. On the other hand, the pulse generation
mechanism signals the generation of a new pulse with the Boolean function HASADVANCEDi. An additional
assumption is that if HASADVANCEDi returns true, then it does not return true again before GETCURRENTi
is invoked. The role of procedure EVENTi is only to incorporate any relevant information contained in msgi,
whereas the local state transition is performed by procedure PULSEi (which also gets the current pulse as
part of its input).

Algorithm 2 Outline of the computation at node i in the pulse-driven framework.
1: Set the initial state of i
2: ti ← GETCURRENTi()
3: PULSEi(ti, MSGi)
4: Send each message in MSGi to a specified neighbor
5: while global termination is not known to i do
6:     if HASADVANCEDi() then
7:         ti ← GETCURRENTi()
8:         PULSEi(ti, MSGi)
9:         Send each message in MSGi to a specified neighbor
10:    else if msgi has arrived from a neighbor of i then
11:        EVENTi(msgi)
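
The interplay between the two functions of the pulse generation mechanism and the loop of Algorithm 2 can be sketched in Python as follows. The class PulseGenerator, its method generate (standing for whatever external rule produces pulses), and the helper node_step are hypothetical names introduced for this sketch; only the contract of has_advanced and get_current is taken from the text.

```python
class PulseGenerator:
    """Hypothetical pulse generation mechanism for one node."""

    def __init__(self):
        self.generated = 0         # index of the most recent pulse generated
        self.started = 0           # pulse last returned by get_current
        self.awaiting_get = False  # has_advanced returned True, get_current pending

    def generate(self):
        """Called by the (unspecified) rule that decides to create a new pulse."""
        self.generated += 1

    def has_advanced(self):
        # Never returns True twice without an intervening get_current.
        if self.awaiting_get or self.generated == self.started:
            return False
        self.awaiting_get = True
        return True

    def get_current(self):
        self.started = self.generated
        self.awaiting_get = False
        return self.generated


def node_step(gen, inbox, pulse, event):
    """One pass through the while loop of Algorithm 2 (lines 6-11).

    Returns the list of messages to send (empty when an event is handled).
    """
    if gen.has_advanced():
        return pulse(gen.get_current())
    if inbox:
        event(inbox.pop(0))
    return []


# Small demonstration with logging procedures.
log = []

def pulse(t):
    log.append(("pulse", t))
    return []

def event(m):
    log.append(("event", m))

gen = PulseGenerator()
inbox = ["a", "b"]
gen.get_current()                    # line 2 of Algorithm 2
node_step(gen, inbox, pulse, event)  # no new pulse yet: handles "a"
gen.generate()
node_step(gen, inbox, pulse, event)  # new pulse: runs pulse(1)
node_step(gen, inbox, pulse, event)  # handles "b"
```

Note that a pending pulse takes precedence over a pending message, as in lines 6 and 10 of Algorithm 2.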

Distributed computations described by the frameworks above are nondeterministic for two reasons, 
namely the asynchronous local computations on the various nodes and the nondeterministic delays incurred 
by message transfers through the network. These two factors affect the order in which messages are processed
by their destination nodes and, consequently, the evolution of the nodes' local states. In many situations
of interest, some computations starting at a fixed initial state evolve more rapidly towards a final state than
others (the actual meaning of "more rapidly" may be faster or more efficient according to some performance
criterion other than time). Thus, it is sometimes desirable to restrict the nondeterminism of a distributed 
computation, which can be accomplished by an appropriate control of the message ordering or the pulse 
generation mechanisms. For instance, synchronous distributed branch-and-bound may visit fewer subproblems
than the asynchronous version [7, 13, 17]; in several cases of distributed iterative algorithms for solving
systems of equations, convergence is not guaranteed for asynchronous executions, unless some restrictions on 
the order of the messages are respected [3]; other examples range from multimedia to agent systems [1, 9, 10]. 

Motivated by the fact that reducing the nondeterminism of distributed computations is useful in a wide 
variety of applications, restricted message ordering and synchronization mechanisms have been implemented 
in several systems [4, 5, 19]. These mechanisms implement two types of condition. The first type is associated 
with the order in which messages are delivered to their destinations [14]. In general terms, such a condition
may be stated as follows: 

Message Delivery Condition: No message msg is to be accepted by node i ∈ N until all local actions
that are required to occur before the reception of msg have taken place. Reception of msg by i may 
then have to be postponed. 

The canonical example here is the FIFO ordering, which requires that every message be delivered to its 
destination only after all other messages sent before it by the same sender through the same channel. The
second type of condition is specific to pulse-driven computations and targets the control of nondeterminism
via the pulse generation mechanism: 

Pulse Generation Condition: No pulse is to be generated at node i ∈ N until all local actions that are
required to occur up to the current pulse have taken place. Generation of a new pulse at i may then 
have to be postponed. 

Particular cases of this condition are present in many models of synchronous distributed computation or in 
algorithms for simulating such models [2, 18]. 

In this paper, we deal with partially ordered computations constituting the subset of all possible asynchronous
distributed computations that comprise only those satisfying some restricted order of their actions.
In the event-driven framework, these restricting orders affect the order in which messages are delivered to 
their destinations (at line 5 of Algorithm 1). In this context, we give a new message delivery condition that 
generalizes the one leading to the FIFO ordering in the sense that the set of asynchronous computations 
satisfying the new condition may be larger than the original one. The computations that violate the FIFO 
ordering but are admitted by the new condition depend on parameters dynamically adjustable during these 
computations. In the same sense, we also generalize the condition associated with the causal ordering studied 
in [4, 12]. The principle of defining an ordering that can be dynamically tuned to become more or less strict 
as the computation evolves can also be applied to the pulse-driven framework. In this case, a generalization 
of an ordering can be obtained when the relation between the pulse generation mechanism and message 
delivery (lines 6 and 10 of Algorithm 2) is required to satisfy some constraints. We introduce an ordering 
that generalizes the well-known fully synchronous ordering [2], among others. 

Besides this introductory section, we present in Section 2 motivating applications for later discussion. 
Two general distributed problem solving methods are used for this purpose. The usefulness of a generalized 
message delivery condition, and of the ordering it induces, for the event-driven framework is illustrated 
by an asynchronous version of the synchronous distributed randomized backtrack search algorithm of [13] 
in which the asynchronism is controlled. For the pulse-driven framework, implementations of iterative 
algorithms for systems of linear equations are also presented in that section. The generalized message
delivery conditions we propose for the event-driven framework are then presented in Section 3, whereas the 
pulse-driven computations, with the new message delivery and pulse generation conditions, are the subject 
of Section 4. In both sections, the use of the new conditions in the motivating applications is also discussed. 
We close the paper with some concluding remarks and directions for further work in Section 5. 

2 Two motivating applications 

In this section, we describe applications in which the local state of each node i ∈ N evolves according to
the computations performed locally by i, which in turn are affected by the local states of other nodes. For 
each application, we give a brief description using the frameworks mentioned in Section 1, and we outline 
some partially ordered executions that have the effect of establishing dependencies between local states in 
distinct nodes in terms of the message delivery and pulse generation conditions. Further details of their
implementations are left to Sections 3 and 4. 

2.1 Backtrack search 

2.1.1 Search problems 

A search problem asks that certain object arrangements, called solutions, be found out of a large set of 
arrangements. Enumerating the extensions of a partially ordered set is a typical search problem, in which 
the arrangements are all the binary relations that can be defined on a set given as input and the solutions 
are only the relations that are partial orders containing the given partial order on the input set [8]. Search 
problems are amenable to a distributed treatment when the set of possible arrangements can be appropriately 
partitioned among the nodes of the distributed system. We take as our first example the randomized approach 
proposed and analyzed in [13] for such a partitioning in backtrack search algorithms. In the context of the 
algorithm in [13], henceforth referred to as the Karp-Zhang algorithm, a backtrack algorithm proceeds 
by successively applying a branching procedure to partition the set of possible arrangements. During this 
process, a subproblem corresponds to a subset of the set of possible arrangements. Each time the branching 
procedure is applied to subproblem s, it either solves s directly (and does not produce any other subproblem)
or derives from s a set of subproblems such that the solutions of s can be found from the solutions of the 
derived subproblems. An assumption that is tacitly made is that the branching procedure produces a tree, 
defined by the subproblems as nodes and the relation "subproblem derived from a branching of" as edges. 
We consider the case in which all leaves of the search tree must be generated and solved by the branching 
procedure. Thus, the recursion stops only when there are no subproblems left. 

2.1.2 An event-driven randomized algorithm 

The Karp-Zhang algorithm is randomized and may be described in the event-driven framework as follows. 
At any point of the execution, a frontier subproblem is a subproblem that has been generated but not yet 
passed on to the branching procedure. The intrinsic concurrency of the backtrack search stems from the fact 
that frontier subproblems can be distributed among the nodes, each subproblem to exactly one node. The 
frontier subproblems assigned to node i form the set Fi, also called the local frontier of i. The local frontier 
of a node evolves dynamically with subproblem branching; its size increases when new subproblems are 
created and decreases when a subproblem is solved. Since busy nodes (those with nonempty local frontiers) 
are able to branch frontier subproblems concurrently, idle nodes (those with empty local frontiers) request 
frontier subproblems from another node in order to become busy as fast as possible. This is the basic idea 
of Algorithm 3. 

For the purpose of handling the partitioning of the subproblems, there are three types of events. In a 
branching event, node i performs a branching on a frontier subproblem (lines 15-17); in a pairing event, 
node i uses a pairing message to request subproblems from some potential donating node (lines 19-21); and
in a donation event, node i uses a donation message to donate half of its local frontier to a requesting node
(lines 7-9). In every event, node i sends a message to every neighbor in G. At least |N(i)| - 2 out of
these |N(i)| messages are pairing messages without a donation request (line 23). There are two possibilities
for the remaining two messages added to MSGi: either both are pairing messages, at most one requesting
donation (line 21) and, consequently, at least one not requesting donation (line 23), or there is a donation
message (line 9) and a pairing message (line 21 or 23). For simplicity of presentation, the actions related to
termination detection are omitted. 

According to line 15 in Algorithm 3, the frontier subproblem considered in a branching event depends 
on an ordering of the subproblems from left to right in the search tree. In this ordering, a subproblem s is 
to the left of another subproblem s' if s is not in the path from the root to s' and is visited before s' in a 
depth-first traversal of the search tree. The set of pairs (i, j) such that node i donates to node j is called the
pairing set. For the pair (i, j), the donation event includes, for node i, line 7, where half of the subproblems
in Ti are chosen to be transferred to node j (Ti ⊆ Fi contains the lowest-level subproblems of Fi, the level of
a subproblem being the distance from it to the root of the search tree). Initially, Fi gets the initial problem
for exactly one node i, whereas the others get ∅. Taking the execution time as the number of branchings
in the fully synchronous model [2], it is known in the worst case that the Karp-Zhang algorithm is, within 
constant factors and with high probability, as efficient as any deterministic depth-first algorithm that always 
chooses the pairing set as large as possible [13, 20]. 
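
A minimal Python sketch of the event procedure of Algorithm 3 may help to follow the three kinds of events. It rests on several assumptions that are not in the text: a subproblem is represented by the tuple of child indices on its path from the root (so that the leftmost frontier subproblem is the lexicographically smallest tuple and the level is the tuple length), the bracket in line 7 is read as rounding up, and a guard is added for the case in which no neighbor is left to receive a donation request. The names event, branch, PAIR, PAIR_REQUEST, and DONATION are hypothetical.

```python
import random

PAIR, PAIR_REQUEST, DONATION = "pair", "pair+request", "donation"

def event(i, msg, frontier, neighbors, branch, rng=random):
    """Sketch of EVENTi in Algorithm 3; returns MSGi as (dest, kind, payload).

    msg is (origin, kind, payload); frontier is the set Fi (mutated in place);
    branch(s) returns the subproblems derived from s (empty if s is solved).
    """
    j, kind, payload = msg                                   # line 2
    out = []                                                 # line 3
    remaining = list(neighbors)                              # line 4
    if kind == PAIR_REQUEST:                                 # line 5
        if len(frontier) > 2:                                # line 6, as printed
            low = min(len(s) for s in frontier)
            lowest = sorted(s for s in frontier if len(s) == low)  # Ti
            donated = lowest[: (len(lowest) + 1) // 2]       # assumption: round up
            frontier.difference_update(donated)              # line 8
            out.append((j, DONATION, donated))               # line 9
            remaining.remove(j)                              # line 10
    elif kind == DONATION:                                   # lines 11-13
        frontier.update(payload)
    if frontier:                                             # lines 14-17
        s = min(frontier)                                    # leftmost subproblem
        frontier.remove(s)
        frontier.update(branch(s))
    if not frontier and remaining:                           # lines 18-21 (+ guard)
        dest = rng.choice(remaining)
        remaining.remove(dest)
        out.append((dest, PAIR_REQUEST, None))
    for k in remaining:                                      # lines 22-23
        out.append((k, PAIR, None))
    return out

# Hypothetical branching procedure: a complete binary tree of depth 2.
def branch(s):
    return [] if len(s) >= 2 else [s + (0,), s + (1,)]

F = {(0, 0), (0, 1), (1,)}
msgs = event(1, (2, PAIR_REQUEST, None), F, [2, 3], branch, random.Random(0))
```

In this run node 1 donates the level-1 subproblem (1,) to node 2, solves the leaf (0, 0), and sends a plain pairing message to node 3.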

2.1.3 Partially ordering the messages 

This adaptation of the Karp-Zhang algorithm to our (asynchronous) event-driven model respects the basic 
conditions for their probabilistic analysis, except for the synchronism and completely connected network
assumptions. There are two reasons for relaxing these assumptions in Algorithm 3. First, it is more realistic, reducing
the communication-related stress of the underlying physical system. Secondly, it avoids most of the
synchronization overhead that the synchronous version incurs when the time taken by the branching procedure
varies according to the subproblem. The price to pay is, potentially, a less efficient partitioning of the frontier
subproblems among the nodes, due to an increase in the probability of unsuccessful donation requests. In
order to minimize the effects of this drawback, a dynamic ordering of the messages could be imposed such
that donation messages would not be overtaken by sequences of messages of any kind, in particular those
ending with a pairing message with a donation request. We give more details in Section 3.

Algorithm 3 Events of an asynchronous version of the Karp-Zhang distributed backtrack search algorithm.
1: procedure EVENTi(msgi, MSGi)
2:     Let j be the origin of msgi
3:     MSGi ← ∅
4:     Ni ← N(i)
5:     if msgi is a pairing message with a donation request then
6:         if |Fi| > 2 then
7:             Let Di ⊆ Ti be a set of [|Ti|/2] subproblems in Ti
8:             Fi ← Fi \ Di
9:             Add to MSGi a donation message containing Di and addressed to j
10:            Ni ← Ni \ {j}
11:    else if msgi is a donation message then
12:        Let Di be the set of subproblems donated by j
13:        Fi ← Fi ∪ Di
14:    if Fi ≠ ∅ then
15:        Let si be the leftmost subproblem in Fi
16:        Fi ← Fi \ {si}
17:        Fi ← Fi ∪ {s | s is a subproblem produced by the branching of si}
18:    if Fi = ∅ then
19:        desti ← a random member of Ni
20:        Ni ← Ni \ {desti}
21:        Add to MSGi a pairing message requesting donation to desti
22:    for all k ∈ Ni do
23:        Add to MSGi a pairing message not requesting donation to k

2.2 Iterative methods

2.2.1 Systems of linear equations

Suppose we are given a sparse, invertible n × n matrix A and a size-n vector b of real numbers. Iterative
algorithms that generate a sequence of approximations to the solution vector x of the system of linear
equations Ax = b are very efficient methods to solve, in a distributed environment, the systems that arise
in a number of engineering and science applications [15]. Generally speaking, the sequence starts with an
initial guess x^0 for the solution vector x and a linear operator is iteratively applied to produce the successive
approximations to x until satisfactory convergence is achieved. The linear operator used in each iteration
commonly performs a matrix-vector multiplication involving the coefficient matrix A and the approximation
vector produced in the previous iteration.

An important issue when distributed implementations of such algorithms are considered is the mutual
dependency of the entries of x in the corresponding linear operators, which dramatically affects the
convergence speed. Assuming that each node is responsible for computing an entry of the solution vector x and
also for the time being that G is completely connected, let xi, for i ∈ N, be the entry of x assigned to
node i. Since the coefficient matrix A is assumed to be sparse, the matrix-vector multiplication in the linear
operator involving A establishes a dependency standard that is related to the nonzero entries of A. Let
Gk be a subgraph of G defined by the set Ek of edges that represent the dependency standard for iteration k, which
means that ij ∈ Ek if and only if the entry xi[k] depends on the entry xj[k] (and conversely); otherwise,
either xi[k] depends on xj[k - 1] and xj[k] on xi[k - 1] or no dependency exists between xi and xj. The
concurrency attainable in an iteration is determined by a coloring of Gk, which turns out to be a mapping c
from N to an ordered set of colors such that neighbor nodes get different colors. Note that, for every i ∈ N,
the computation carried out in an iteration at every other node getting the same color as i is independent
of xi.

Two classical examples of iterative algorithms are the Jacobi and Gauss-Seidel methods. For these 
examples, G is such that its edges represent the nonzero entries of A, i.e., ij ∈ E if and only if the element
aij of A is nonzero. In the Jacobi case, for example, xi[k] depends on the values produced by the neighbors
of i at iteration k - 1 in such a way that, for all k > 0,

    x_i[k] = \frac{r_{i,N(i)}[k-1]}{a_{ii}} + x_i[k-1],    (1)

where

    r_{i,N(i)}[k-1] = b_i - \sum_{j \in N(i)} a_{ij} x_j[k-1].    (2)

An execution is said to have converged after iteration k if the magnitude of the residual r[k] = b - Ax[k] is
smaller than a certain tolerance. The quadratic norm is commonly used to compute the magnitude of the
residual, and the tolerance is usually a very small number ε ∈ (0, 1).
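
The Jacobi iteration and its stopping test can be sketched sequentially in Python as below. The sketch assumes that the residual entry used in the update includes the diagonal term a_ii x_i, as it does in lines 10 and 15 of Algorithm 4, so that x_i[k] = x_i[k-1] + r_i/a_ii is the usual Jacobi update; the function name jacobi, the dense list-of-lists storage, and the test system are choices made for illustration only.

```python
def jacobi(A, b, x0, tol=1e-10, max_iter=1000):
    """Sequential sketch of the Jacobi iteration.

    Returns (x, k): the approximation and the number of iterations performed.
    Convergence is declared when the quadratic (Euclidean) norm of the
    residual b - A x falls below tol.
    """
    n = len(b)
    x = list(x0)
    for k in range(max_iter):
        # Residual of the current approximation; entry i only involves the
        # nonzero entries of row i, i.e. node i itself and its neighbors N(i).
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n) if A[i][j] != 0.0)
             for i in range(n)]
        if sum(v * v for v in r) ** 0.5 < tol:
            return x, k
        # All entries are updated from the values of the previous iteration.
        x = [x[i] + r[i] / A[i][i] for i in range(n)]
    return x, max_iter

# Hypothetical diagonally dominant system whose solution is (1, 1, 1).
A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]
x, iterations = jacobi(A, b, [0.0, 0.0, 0.0])
```

In a distributed execution each entry of r and x would be held by a different node, and the sum over j would be fed by the messages received from the neighbors.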

In many situations of practical interest, a coloring can be used to improve the convergence properties 
of (1), as follows. Let c be a coloring and define C(i) = N(i) ∩ {j | c(j) < c(i)} (the set of neighbors of i
which are assigned colors smaller than c(i)) and C̄(i) = N(i) \ C(i). Based on this coloring, the iteration k
of a general method becomes

    x_i[k] = [...] + x_i[k-1],    (3)

where r_{i,C(i)}[k] and r_{i,C̄(i)}[k-1] are given analogously to (2). By the definition of C(i), it is clear that the
nodes of G operate in the order of their colors during iteration k. Notice that (1) is obtained from (3) by
simply taking Ek = ∅ for all k and a coloring that assigns the same color to all nodes (thence C̄(i) = N(i)
for all i ∈ N). On the other hand, the Gauss-Seidel method is obtained by letting ij ∈ Ek if and only if
aij ≠ 0.
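
The effect of a coloring on one iteration can be sketched as follows, under the reading that node i uses the iteration-k values of the neighbors with smaller colors and the iteration-(k-1) values of all other neighbors. The function colored_sweep and the example data are hypothetical; nodes are indexed from 0 and the coloring is given as a list.

```python
def colored_sweep(A, b, x, color):
    """One iteration in which nodes operate in the order of their colors.

    Nodes of the same color are updated from a common snapshot, so they are
    independent of each other; nodes of later colors see the new values.
    """
    n = len(b)
    new = list(x)
    for c in sorted(set(color)):
        snapshot = list(new)   # values visible to every node of color c
        for i in range(n):
            if color[i] == c:
                r = b[i] - sum(A[i][j] * snapshot[j] for j in range(n))
                new[i] = snapshot[i] + r / A[i][i]
    return new

A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]

# A single color gives a Jacobi step.
jacobi_step = colored_sweep(A, b, [0.0, 0.0, 0.0], [0, 0, 0])
# Two colors (nodes 0 and 2 are not neighbors) give a Gauss-Seidel-like step.
two_color_step = colored_sweep(A, b, [0.0, 0.0, 0.0], [0, 1, 0])
```

With two colors the middle node already uses the new values of its two neighbors within the same iteration, which is what typically speeds up convergence.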
2.2.2 Pulse-driven algorithms

The pulse-driven framework allows the description of the various dependency standards implicit in (3) solely 
in terms of the message delivery and pulse generation conditions. By effecting such a separation of
computation, communication, and synchronization concerns, one can implement several algorithms with a single
implementation of the procedures EVENTi and PULSEi, provided GETCURRENTi and HASADVANCEDi are
consistent with (3) for the coloring at hand. Such flexibility leaves room for several dynamic implementations
of (3). To be more precise, iteration (3) can be described by a distributed execution in which the actions
performed by the nodes are determined by the procedures in Algorithm 4. In this algorithm, node i stores Ai
and bi, respectively the ith row of A and the ith element of b. In addition, it starts out with k = 0, xi = xi^0, ri = 0,
and receivedi = |C(i)|. The number of pulses to compute an iteration is determined by the number of colors
used in the coloring, ncolors, and the number nresidual of additional pulses required to compute the residual 
(this latter computation is left unspecified in Algorithm 4). The new approximations are computed during 
the first ncolors pulses, whereas the remaining nresidual pulses are devoted to concluding the computation 
of the residual. 

Algorithm 4 Iterative linear operator.
1: procedure EVENTi(msgi)
2:     Increment receivedi by 1
3:     if msgi contains xj for some j ∈ N(i) then
4:         ri ← ri - aij xj
5:     else
6:         Update residual
7: procedure PULSEi(ti, MSGi)
8:     if receivedi = |N(i)| + nresidual then
9:         if the magnitude of the residual is not in (0, ε] then
10:            xi ← ri/aii + xi
11:            Add to MSGi a message containing xi and addressed to each j ∈ N(i)
12:            Add to MSGi the messages related to the computation of the residual
13:            k ← k + 1
14:            receivedi ← 0
15:            ri ← bi - aii xi
16:    else
17:        Continue updating the residual



There are two types of message sent by node i in connection with iteration k and, for each of these two 
types, a maximum delay is established as a function of the coloring c. A message of the first type contains an 
approximation to xi and is sent to all j ∈ N(i), as indicated in line 11. For each j, the delay of this message
is at most the number of pulses until j needs xi[k] to update its own approximation, xj. The maximum
delay is then either c(j) - c(i), if i ∈ C(j), or ncolors + nresidual, otherwise. A message of the second type
is the one used in the computation of the residual in lines 12 and 17. The delay of each of these messages 
should be set in such a way that the comparison in line 8 returns true within at most ncolors + nresidual 
pulses. 
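
The bound on the delay of a message of the first type can be written as a small Python function. The name max_delay and the example coloring are hypothetical; colors are assumed to be integers so that their difference counts pulses.

```python
def max_delay(i, j, color, ncolors, nresidual):
    """Maximum delay, in pulses, of the message carrying x_i[k] from i to j.

    If i has a smaller color than j (i is in C(j)), j needs x_i[k] within the
    same iteration, c(j) - c(i) pulses later; otherwise the bound stated in
    the text is a whole iteration, ncolors + nresidual pulses.
    """
    if color[i] < color[j]:
        return color[j] - color[i]
    return ncolors + nresidual

# Hypothetical coloring of three mutually adjacent nodes.
color = {1: 0, 2: 1, 3: 2}
same_iteration = max_delay(1, 3, color, ncolors=3, nresidual=1)
next_iteration = max_delay(3, 1, color, ncolors=3, nresidual=1)
```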

3 Event-driven computations 

In this section, we introduce a slight variation of the formalism adopted in [2] as our framework of
event-driven computations. Next, we discuss some message delivery conditions, each of them leading to a particular
message ordering, that can be used to reduce the intrinsic nondeterminism of such a framework. All references
to procedure EVENTi, i ∈ N, correspond to the procedure described in Section 1 for the event-driven category
of distributed computations. 

3.1 Model 

A set of functions EVENTi, for all i ∈ N, defines what we call an event-driven distributed algorithm Γ on
G. Let Ξ = Ξ1 ∪ ... ∪ Ξn be the set of events of a particular computation of Γ, Ξi being the set of events
occurring at i ∈ N according to Algorithm 1. For ξi ∈ Ξi, let msgi(ξi) be the input message associated
with ξi and MSGi(ξi) denote the set of messages generated by the occurrence of ξi. The sequence of local
computations that determines the evolution of the local state of each node i ∈ N is represented by a total
order on Ξi or, alternatively, by the function timei, defined such that timei(ξi) = ti if and only if ξi ∈ Ξi is
the ti-th event that occurs at i (we also sometimes say that the time at which ξi occurs at i is ti). Messages
in MSGi(ξi) trigger other events at a subset of neighbors of i. Event ξi depends upon msgi(ξi) and the local
state resulting from the previous event at the same node. The only special case is that of timei(ξi) = 1, in
which case ξi depends only on the initial local state.

The causal dependencies of events in Ξ are formally described by means of the usual "happened before"
partial order defined on Ξ as follows [16]. Let ξ be an event. Write node(ξ) for the node at which ξ occurs.
We use ξ → ξ' to denote that either node(ξ) = node(ξ') = i, for some i ∈ N, with timei(ξ) = timei(ξ') - 1,
or node(ξ) ≠ node(ξ') and msg(ξ') ∈ MSG(ξ). Given two events ξ, ξ' ∈ Ξ, they satisfy ξ ⇝ ξ' if and only if
there exists a sequence ξ = ξ1, ..., ξt = ξ' of events in Ξ such that ξs → ξs+1, for every s ∈ {1, ..., t - 1}.

The set Ξ of events also induces the binary relation predj (and its analogue succi) that gives the
predecessor (resp. successor) at node j ∈ N (resp. i ∈ N) of an event occurring at node i ∈ N (resp. j ∈ N) [11].
In more formal terms, given ξi, ξj ∈ Ξ such that node(ξj) = j ≠ node(ξi) = i, we say that ξj = predj(ξi) if
and only if ξj is the latest event in Ξj such that ξj ⇝ ξi. Note that predj(ξi) is left undefined if ξj ⇝ ξi
holds for no event ξj ∈ Ξj. In addition, timej(predj(ξi)) is assumed to be 0 in this case. Analogously,
ξi = succi(ξj) if and only if ξi is the earliest event in Ξi such that ξj ⇝ ξi. We leave succi(ξj) undefined
and set timei(succi(ξj)) = ∞ if ξj ⇝ ξi holds for no event ξi ∈ Ξi. It should be noted that if ξj → ξi,
then predj(ξi) = ξj and succi(ξj) = ξi. However, this may not be the case in the more general situation of
ξj ⇝ ξi, as illustrated in Figure 1.

Figure 1: succi(ξj) = ξi ⇔ predj(ξi) = ξj is false. (a) predj(ξi) ≠ ξj. (b) succi(ξj) ≠ ξi.

Every event ξi ∈ Ξi, i ∈ N, defines the n-dimensional vector clock Ei(ti), where ti = timei(ξi) [11, 12].
For ti > 1, the hth entry of Ei(ti), for h ∈ N, is given by

    E_i^h(t_i) = \begin{cases} t_i, & \text{if } h = i; \\ time_h(pred_h(\xi_i)), & \text{otherwise.} \end{cases}    (4)

The vector clock Ei(ti) gives, for every node h ∈ N, the time of the latest event ξh at h such that ξh ⇝ ξi (if
h = i, the current time at i), unless no such event exists (and then Ei^h(ti) = 0). For ti > 1, Ei^h(ti) is obtained,
for h ≠ i, by taking Ei^h(ti) = max{Ei^h(ti - 1), Ej^h(tj)}, where tj is the time of the event ξj at j ∈ N(i) that
originates the message msgi(ξi). A simple way to maintain vector clocks is then to attach Ej(tj) to every message
in MSGj(ξj) upon the occurrence of ξj. Under the general assumption of an arbitrary message delivery
ordering, situations like the ones illustrated in Figure 1 make such size-n attachments strictly necessary [6].
However, it is possible to use smaller attachments when some specific message delivery ordering is assumed,
as we discuss later.
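
The maintenance rule just described can be sketched in Python as follows: each node keeps its own vector, merges the attached vector entry by entry on every event, and sets its own entry to the local event count. The class name VectorClockNode and the method on_event are hypothetical.

```python
class VectorClockNode:
    """Vector clock of one node, maintained with size-n attachments."""

    def __init__(self, i, nodes):
        self.i = i
        self.clock = {h: 0 for h in nodes}   # 0 means "no such event exists"

    def on_event(self, attached=None):
        """Record an event at node i.

        attached is the vector carried by the triggering message (None for
        the initial event). Returns the vector to attach to every message
        generated by this event.
        """
        if attached is not None:
            for h, t in attached.items():
                if h != self.i:
                    self.clock[h] = max(self.clock[h], t)
        self.clock[self.i] += 1              # own entry: local time of the event
        return dict(self.clock)

nodes = [1, 2, 3]
n1, n2, n3 = (VectorClockNode(i, nodes) for i in nodes)
a = n1.on_event()      # initial event at node 1
b = n2.on_event(a)     # node 2 receives a message generated by that event
c = n3.on_event(b)     # node 3 receives a message from node 2
d = n1.on_event(c)     # node 1 receives a message from node 3
```

The vector d records that the second event at node 1 is causally preceded by the first events at nodes 2 and 3.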

The vector clock Ei(ti) induces the global state Si(ti) with the following characteristics:

• for each node j ∈ N, Si^j(ti) is the local state resulting from the event that occurs at j at time Ei^j(ti);
and

• for each edge i'j' ∈ E, Si^{i'→j'}(ti) is the set of messages in transit from node i' ∈ N to node j' ∈ N(i'),
i.e., those sent by i' no later than Ei^{i'}(ti) and received by j' later than Ei^{j'}(ti); Si^{j'→i'}(ti) is defined
similarly.

The notions of a vector clock and the global state it induces are depicted in Figure 2. In the global state
Si(timei(ξi)), represented in the figure as a dashed line, the events ξi, pred_{i'}(ξi), and pred_{j'}(ξi) determine
the local states of i, i', and j', respectively. The message between i' and j' appears in the state of edge i'j',
in the i' → j' direction.

We henceforth let Mi(ti) be the size-|N(i)| vector whose jth entry Mi^j(ti) = |Si^{j→i}(ti)| records the number
of messages in transit on edge ij from j to i in global state Si(ti).

Figure 2: The global state induced by the vector clock Ei(timei(ξi)).



3.2 FIFO ordering 

The FIFO message delivery condition is the following: if ξj → ξi and ξ'j → ξ'i are two distinct communications
between nodes i ∈ N and j ∈ N(i), then ξj ⇝ ξ'j ⇒ ξi ⇝ ξ'i (notice that the two distinct events ξi and ξ'i
at the same node i are such that ξi ⇝ ξ'i if and only if timei(ξi) < timei(ξ'i); the same holds for events ξj
and ξ'j) [12]. This is equivalent to saying that, if j ∈ N(i) is the origin of the message that triggers the ti-th
event at node i, then no message from j is in transit in Si(ti). Formally:

FIFO Delivery Condition (FDC): For i ∈ N, ti > 1, and j ∈ N(i) the origin of the message that
triggers the ti-th event at i, Mi^j(ti) = 0.

The implementation of the FDC requires the implementation of the function Mi^j(ti) for all ti > 1. For this
purpose, a size-|N(i)| vector ri(ti) and a size-|N(j)| vector sj(tj) can be used to account for the number of
messages exchanged between i and j, respectively, and their neighbors. In particular, entry sj^i(timej(ξj))
indicates the number of messages sent by j to i up to, and including, event ξj. Of these, the number of
messages already received by i up to, and including, event ξi such that ti = timei(ξi) is given by ri^j(ti) [12].
Clearly, if ξj is the trigger of ξi, then sj^i(timej(ξj)) - ri^j(ti) messages are in transit from j to i on edge ij in
global state Si(ti). So Mi^j(ti) = sj^i(timej(ξj)) - ri^j(ti) and, to allow the checking of whether Mi^j(ti) = 0, node
j attaches sj^i(timej(ξj)) to the message sent to i as ξj occurs. The use of the improvement for maintaining
vector clocks we mentioned earlier, known as the Singhal-Kshemkalyani improvement, is straightforward
once the FDC is satisfied [21].
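
A minimal Python sketch of this counter-based check is given below. The sender attaches its running count of messages sent to the destination; the receiver accepts a message only if, counting that message as received, the attached count equals its own count of messages received from the same origin, and postpones it otherwise. The classes FifoSender and FifoReceiver and the buffering strategy are assumptions made for illustration.

```python
class FifoSender:
    """Keeps, for each neighbor i, the number of messages sent to i."""

    def __init__(self, neighbors):
        self.sent = {i: 0 for i in neighbors}

    def send(self, i, payload):
        self.sent[i] += 1
        return (self.sent[i], payload)       # the count is the attachment


class FifoReceiver:
    """Keeps, for each neighbor j, the number of messages received from j."""

    def __init__(self, neighbors):
        self.received = {j: 0 for j in neighbors}
        self.postponed = {j: {} for j in neighbors}   # attachment -> payload

    def arrive(self, j, sent_count, payload):
        """Handle the arrival of a message from j.

        Returns the payloads accepted as a consequence, in acceptance order.
        A message is accepted only when no earlier message from j is still
        in transit, i.e. when sent_count - (received + 1) = 0.
        """
        self.postponed[j][sent_count] = payload
        accepted = []
        while self.received[j] + 1 in self.postponed[j]:
            self.received[j] += 1
            accepted.append(self.postponed[j].pop(self.received[j]))
        return accepted


sender = FifoSender([1])              # node 2, with neighbor 1
receiver = FifoReceiver([2])          # node 1, with neighbor 2
m1 = sender.send(1, "a")
m2 = sender.send(1, "b")
m3 = sender.send(1, "c")
first = receiver.arrive(2, *m2)       # overtakes m1: postponed
second = receiver.arrive(2, *m3)      # still waiting for m1
third = receiver.arrive(2, *m1)       # releases all three in sending order
```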

3.3 Causal ordering 

A stronger notion than FIFO ordering is that of causal ordering, which requires that no single message be
overtaken by any sequence of messages [4, 12, 14]. Usually, causal ordering is given the following formal
statement [4, 12]. Let $\xi_j \to \xi_i$ be a communication between nodes $j \in N$ and $i \in N(j)$, and $\xi'_j \rightsquigarrow \xi'_i$, where
$\xi'_j \in \Xi_j$, $\xi'_i \ne \xi_i \in \Xi_i$, and $\xi_j \rightsquigarrow \xi'_j$. Then $\xi_i \rightsquigarrow \xi'_i$. Situations like the one depicted in Figure 3,
where the last message, say $\xi_k \to \xi'_i$, in the sequence of messages leading from $\xi_j$ to $\xi'_i$ is such that $\xi'_i \rightsquigarrow \xi_i$,
are not allowed to happen (but notice that situations like those in Figure 1 may occur even if messages are
causally ordered, and therefore $\mathit{succ}_i(\xi_j) = \xi_i \Leftrightarrow \mathit{pred}_j(\xi_i) = \xi_j$ remains false).

Figure 3: A single message overtaken by a sequence of messages. The message $\xi_j \to \xi_i$ is in transit in the
global state $S_i(\mathit{time}_i(\xi'_i))$.

The causal ordering is defined in terms of numbers of messages in transit in global states as follows:

Causal Delivery Condition (CDC): For $i \in N$, $t_i \ge 1$, and $j \in N(i)$, $M_i^j(t_i) = 0$.

The implementation of the CDC also requires the functions $M_i^j(t_i)$ for all $t_i \ge 1$. But, unlike the case of the
FDC, $M_i^j(t_i)$ is needed whenever $i$ receives a message, not just upon the arrival of a message from $j$. These
functions can still be implemented based on $r_i^j(t_i)$ and $s_j^i(\mathit{time}_j(\mathit{pred}_j(\xi_i)))$, and the number of messages
in transit from $j$ to $i$ on $ij$ in global state $S_i(t_i)$ is still given by $s_j^i(\mathit{time}_j(\mathit{pred}_j(\xi_i))) - r_i^j(t_i)$. But now
maintaining $M_i^j(t_i)$ has to be approached similarly to maintaining a vector clock. The only difference in
this case is that the attachment received by $i$ along with a message from $k \in N(i)$ represents the view, at
$k$, of the number of messages whose sending events by $j$ up to, and including, event $\mathit{pred}_j(\xi_i)$ precede the
$t_i$th event at $i$ causally. And since causal ordering implies the FDC, the simplified implementation of vector
clocks mentioned in that case can be easily adapted.
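As an illustration of how the CDC differs from the FDC at the receiver, the following minimal Python sketch checks every neighbor's counter rather than only the origin's. The function name `cdc_allows` and the dictionary-based attachment are assumptions of this sketch, not the paper's data structures; the attachment is taken to map each node $k$ to the sender's view of how many messages $k$ has sent to $i$.

```python
def cdc_allows(origin, attached_sent, received):
    """Return True if delivering the arriving message leaves no message in transit.

    origin: the neighbor j that sent the arriving message.
    attached_sent: {k: sender's view of s_k^i} carried by the message.
    received: {k: r_i^k}, messages from k already delivered at i.
    """
    tentative = dict(received)
    # Count the arriving message itself as received, as r_i^j(t_i) does.
    tentative[origin] = tentative.get(origin, 0) + 1
    # The sender's view may lag behind, so the difference can be negative;
    # only a positive difference reveals a message still in transit.
    return all(sent - tentative.get(k, 0) <= 0
               for k, sent in attached_sent.items())


received = {"j": 0, "k": 0}
# j's message causally follows a message that k sent to i and that has
# not arrived yet, so it has to wait.
print(cdc_allows("j", {"j": 1, "k": 1}, received))  # False
```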

3.4 Relaxed FIFO ordering and the vector clock algorithm 

In many situations of interest, some degree of nondeterminism is tolerable. In the event-driven framework, a 
way to introduce some tolerance on message delivery ordering is to allow up to a certain number of messages 
to be in transit in the global states induced by vector clocks. When applied to FDC, this relaxation can be 
formulated as: 

Relaxed FIFO Delivery Condition (RFDC): For $i \in N$, $t_i \ge 1$, and $j \in N(i)$ the origin of the message
that triggers the $t_i$th event at $i$, $M_i^j(t_i) \le \mu_j^i(E_i^j(t_i))$.

In this formulation, $\mu_j^i(E_i^j(t_i))$ is a nonnegative integer function, determined by $j$, that indicates the
maximum number of messages which are accepted to be in transit in $S_i(t_i)$ from $j$ to $i$ on $ij$. Clearly, FDC
corresponds to RFDC with $\mu_j^i(E_i^j(t_i)) = 0$ for all $t_i \ge 1$. The computation of $M_i^j(t_i)$ can be conducted as
before, and RFDC is implemented with the additional attachment of $\mu_j^i(\mathit{time}_j(\mathit{pred}_j(\xi_i)))$ to the message
sent to $i$ by $\mathit{pred}_j(\xi_i)$.
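A receiver-side buffer obeying the RFDC could look like the sketch below. It is an illustration only: the class `RelaxedFifoReceiver` and its fields are hypothetical, a single sending neighbor $j$ is assumed, and each message is assumed to carry both the sender's count $s_j^i$ and the tolerance $\mu_j^i$ chosen by $j$.

```python
class RelaxedFifoReceiver:
    """Node i receiving from one neighbor j under the RFDC (sketch)."""

    def __init__(self):
        self.received = 0      # r_i^j: messages from j delivered so far
        self.postponed = []    # buffered (sent_count, tolerance, payload)
        self.delivered = []    # payloads in delivery order

    @staticmethod
    def allows(sent_count, received, tolerance):
        # M_i^j = s_j^i - r_i^j, with the candidate message counted in r_i^j.
        return sent_count - (received + 1) <= tolerance

    def on_arrival(self, sent_count, tolerance, payload):
        self.postponed.append((sent_count, tolerance, payload))
        progress = True
        while progress:                      # retry until nothing else fits
            progress = False
            for entry in list(self.postponed):
                s, mu, data = entry
                if self.allows(s, self.received, mu):
                    self.postponed.remove(entry)
                    self.delivered.append(data)
                    self.received += 1
                    progress = True


rcv = RelaxedFifoReceiver()
rcv.on_arrival(3, 1, "c")   # two earlier messages missing: postponed
rcv.on_arrival(2, 1, "b")   # one missing, tolerated; "c" then fits too
rcv.on_arrival(1, 0, "a")   # nothing sent earlier is missing
print(rcv.delivered)        # ['b', 'c', 'a']
```

With every tolerance set to 0 the same buffer enforces the FDC.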

One natural question that arises when RFDC is considered is related to its consequences in the computation
of vector clocks. According to the main idea in the Singhal-Kshemkalyani vector clock algorithm [21],
and taking into account that the message that triggers $\xi_i$ is allowed to be overtaken by $\mu_j^i(E_i^j(t_i))$ messages
sent previously by $j$, node $j$ attaches to that message only the entries of $E_j(\mathit{time}_j(\mathit{pred}_j(\xi_i)))$ that were
modified since the earliest of the last $\mu_j^i(E_i^j(t_i))$ transmissions to $i$. For this purpose, node $j$ keeps the
additional size-$|N(j)|$ vector $U_j(\mathit{time}_j(\xi_j))$, whose entry $U_j^k(\mathit{time}_j(\xi_j))$ accounts for the number of messages
sent to $k \in N(j)$ to which the value of $E_j^k(\mathit{time}_j(\xi_j))$ was attached. Then node $j$ only needs to attach those
entries $E_j^k(\mathit{time}_j(\xi_j))$ such that

$$U_j^k(\mathit{time}_j(\xi_j)) \le \mu_j^i(E_i^j(t_i)). \qquad (5)$$
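The selection rule (5) can be sketched in Python as follows. This is one possible reading rather than the authors' implementation: the class `DifferentialClock` is hypothetical, and the counters are indexed here by destination and by clock entry, which is an assumption of the sketch. A counter is reset whenever the corresponding clock entry changes, so that each new value is attached to $\mu + 1$ consecutive messages to a given destination.

```python
class DifferentialClock:
    """Vector clock of node j with tolerance-aware differential attachments."""

    def __init__(self, neighbors, all_nodes):
        self.clock = {k: 0 for k in all_nodes}
        # times_attached[i][k]: messages sent to neighbor i that carried
        # the current value of clock[k] (assumed indexing).
        self.times_attached = {i: {k: 0 for k in all_nodes} for i in neighbors}

    def update_entry(self, k, value):
        if value > self.clock[k]:
            self.clock[k] = value
            for counts in self.times_attached.values():
                counts[k] = 0           # the new value was never attached

    def attachment_for(self, i, tolerance):
        attached = {}
        for k, count in self.times_attached[i].items():
            if count <= tolerance:      # condition (5)
                attached[k] = self.clock[k]
                self.times_attached[i][k] = count + 1
        return attached


dc = DifferentialClock(neighbors=["i"], all_nodes=["i", "j", "k"])
dc.update_entry("j", 1)
print(dc.attachment_for("i", 1))  # all entries, first time
print(dc.attachment_for("i", 1))  # all entries again (tolerance 1)
print(dc.attachment_for("i", 1))  # {} : nothing changed since
```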

3.5 Relaxed causal ordering and an algorithm 

The causal ordering defined above may become too strict for a number of communications occurring during 
an execution, leading to unnecessary loss in concurrency. This fact motivates the definition of a relaxed 
causal ordering: 

Relaxed Causal Delivery Condition (RCDC): For $i \in N$, $t_i \ge 1$, and $j \in N(i)$, $M_i^j(t_i) \le \mu_j^i(E_i^j(t_i))$.

Algorithms 5 and 6 are an implementation of RCDC based on the variables and ideas described earlier. For
instance, the vector accounting for the messages sent by each node is maintained by an algorithm similar
to that for maintaining the vector clock. The entry of that vector at node $i$ corresponding to nodes $j$ and
$k$ is given by $s_j^k(\mathit{time}_j(\mathit{pred}_j(\xi_i)))$ and denoted by $s_i^{j \to k}(t_i)$. The implementation described in Algorithm 5
corresponds to the message sending in lines 3 and 7 of Algorithm 1 for node $j$, whereas the reception of a
message by node $i$ is implemented in Algorithm 6. We assume that $t_i$ and $t_j$ are the times of the events that
receive and send the messages, respectively. To each message it sends, node $j$ attaches the tolerance $\mu_j^i(t_j)$
and the entries $s_j^{k \to \ell}(t_j)$ chosen so that (5) holds, and updates the corresponding $U_j^{k \to \ell}(t_j)$, the entry of the
vector $U_j(t_j)$ associated with nodes $k$ and $\ell$. In Algorithms 5 and 6, SETTOLERANCE$_i$ and GETTOLERANCE$_i$
are application-specific and are used respectively to attach the tolerance to, or retrieve it from, a message.
We give an example next.

3.6 An application: distributed backtrack search 

Let us consider the distributed backtrack search application of Section 2.1. In this application, as indicated 
before, a dynamic ordering of the messages is useful to avoid the overtaking of donation messages by any 
message sequences. Recall that there are, besides donation messages, two other types of message in this 
application, namely pairing messages with and without a donation request. Two situations involving a 
pairing message with a donation request from i G N to j G N(i) and a sequence of messages starting at i 
and ending by a donation message to j are shown in Figure 4. In the case in which the donation message is 
overtaken, the donation request from i fails. 

The original idea is then to let the tolerance $\mu_i^j(t_i)$ be determined by node $i$ according to the type of
message that is being sent to $j \in N(i)$. If this message is a pairing message with a donation request, then
$\mu_i^j(t_i)$ is set to a small value; otherwise, it is set to a very large value. The small value depends on the degree
of causality that better suits the current stage of the search. A conservative implementation will choose zero






Algorithm 5 Implementation of the RCDC: message sending in lines 3 and 7 of Algorithm 1 for node $j$.

Sending, by node $j$, of the messages in $\mathit{MSG}_j(\xi_j)$:
 1: for all $i \in N(j)$ for which a message $m$ exists in $\mathit{MSG}_j$ do
 2:     SETTOLERANCE$_j(m, i)$
 3:     $\mu_j^i(t_j) \leftarrow$ GETTOLERANCE$_j(m)$
 4:     Initialize attachment $a_j^i$ with $\emptyset$
 5:     for all $k \in N$, $k \ne j$ do
 6:         for all $\ell \in N(k)$ do
 7:             if $U_j^{k \to \ell}(t_j) \le \mu_j^i(t_j)$ then
 8:                 Include $s_j^{k \to \ell}(t_j)$ in $a_j^i$
 9:                 Increment $U_j^{k \to \ell}(t_j)$ by 1
10: for all $m \in \mathit{MSG}_j$ do
11:     Let $i$ be the destination of $m$
12:     Increment $s_j^{j \to i}(t_j)$ by 1 and include it in $a_j^i$
13:     Attach $a_j^i$ to $m$
14:     Send $m$




(a) Donation is overtaken and donation request fails. (b) Donation is not overtaken and donation request succeeds.

Figure 4: Successful response to a donation request may depend on whether it is overtaken by a sequence of
messages.

in order to guarantee that donations are never overtaken by donation requests. Algorithm 7 summarizes 
this. 
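The tolerance policy just described can be written down as a short Python sketch. The `Message` class, its `kind` labels, and the `small_value` parameter are hypothetical names introduced for illustration; only the rule itself (a small tolerance for pairing messages with a donation request, a very large one otherwise) comes from the text.

```python
import math
from dataclasses import dataclass


@dataclass
class Message:
    kind: str                      # "pairing", "pairing+request" or "donation"
    tolerance: float = math.inf    # attached tolerance


def set_tolerance(message, small_value=0):
    """Attach the tolerance according to the message type."""
    if message.kind == "pairing+request":
        message.tolerance = small_value   # conservative choice: zero
    else:
        message.tolerance = math.inf      # "very large": ordering unconstrained
    return message


def get_tolerance(message):
    """Retrieve the tolerance attached to a message."""
    return message.tolerance


print(get_tolerance(set_tolerance(Message("pairing+request"))))  # 0
print(get_tolerance(set_tolerance(Message("donation"))))         # inf
```

A less conservative search stage could pass a larger `small_value`, trading some failed donation requests for fewer postponed messages.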

4 Pulse-driven computations 

A pulse-driven distributed algorithm is defined by a set of functions PULSE$_i$ and EVENT$_i$, for all $i \in N$.
Once again, for given initial local states, there are multiple possible executions of a pulse-driven distributed 
algorithm. There are two sources for this nondeterminism: the delays between consecutive pulses at a single 
node and, as in the event-driven framework, those the messages undergo to be delivered. In this context, an 
appropriate ordering of pulses and messages reduces the nondeterminism of a pulse-driven algorithm.

Algorithm 6 Implementation of the RCDC: reception of a message by node $i$.

Upon arrival of message $\mathit{msg}_i$ from node $j$ at node $i$:
 1: Increment $r_i^j(t_i)$ by 1
 2: $\mu_j^i \leftarrow$ GETTOLERANCE$_i(\mathit{msg}_i)$
 3: if $s_j^{k \to i} - r_i^k(t_i) > \mu_j^i$ for some $k \in N(i)$ such that $s_j^{k \to i}$ is attached to $\mathit{msg}_i$ then
 4:     Decrement $r_i^j(t_i)$ by 1
 5:     Postpone the delivery of $\mathit{msg}_i$
 6: else
 7:     for all $k \in N$, $\ell \in N(k)$ such that $s_j^{k \to \ell}$ is attached to $\mathit{msg}_i$ do
 8:         if $s_i^{k \to \ell}(t_i) < s_j^{k \to \ell}$ then
 9:             $s_i^{k \to \ell}(t_i) \leftarrow s_j^{k \to \ell}$
10:             for all $j \in N(i)$ do
11:                 $U_i^{k \to \ell}(t_i) \leftarrow 0$
12:     Deliver $\mathit{msg}_i$

Upon the occurrence of any event at node $i$:
13: while an undelivered message $\mathit{msg}_i$ exists satisfying the delivery condition of line 3 do
14:     Let $j$ be the origin of $\mathit{msg}_i$
15:     Increment $r_i^j(t_i)$ by 1
16:     Execute lines 7-12


4.1 Model 

In addition to the set $\Xi$ of events, what characterizes a pulse-driven computation on $G$ is a set $\Lambda = \Lambda_1 \cup \cdots \cup \Lambda_n$
of pulses. For each $i \in N$, $\Lambda_i$ stands for the pulses that occur at node $i$. For $\lambda_i \in \Lambda_i$, we let $\mathit{MSG}_i(\lambda_i)$
denote the set of messages generated by the occurrence of $\lambda_i$, represented in Algorithm 2 by the execution
of PULSE$_i$. The function $\mathit{rank}_i$ is used to order the pulses at node $i \in N$ and returns the rank of each pulse
at $i$. The procedure PULSE$_i$ has $\mathit{rank}_i(\lambda_i)$ as input and each of the resulting messages (those in $\mathit{MSG}_i(\lambda_i)$)
triggers an event associated with some pulse at a neighbor of $i$.

In this framework for describing and analyzing pulse-driven computations, the local state updates of any
node are driven by a local clock mechanism, which determines the execution of the pulses as described in
Algorithm 2. Particularly important is how the message-triggered events in $\Xi_i$ are related to the pulses in
$\Lambda_i$. We assume that every event $\xi_i \in \Xi_i$ is associated with a pulse $\lambda_i \in \Lambda_i$ via a function $\mathit{pulse}_i$ such that
$\lambda_i = \mathit{pulse}_i(\xi_i)$. This means that $\xi_i$ occurs at $i$ after $i$'s local clock generates the $(\ell_i - 1)$th pulse, where
$\ell_i = \mathit{rank}_i(\lambda_i)$, and before pulse $\lambda_i$ is generated. Conversely, with every pulse $\lambda_i \in \Lambda_i$ is associated a set of
events, possibly empty, consisting of the intermediate computations occurring at $i$ between the generation
of the $(\ell_i - 1)$th pulse and the $\ell_i$th, where $\ell_i = \mathit{rank}_i(\lambda_i)$.
The relation between events at distinct nodes, illustrated in Figure 5, is formalized by a re-definition of

Algorithm 7 Tolerance setting at node $i$ for the distributed backtrack search.

1: for all $j \in N(i)$ do
2:     $\mu_i^j \leftarrow 0$

3: procedure SETTOLERANCE$_i(m, j)$
4:     if $m$ is a pairing message with a donation request then
5:         $\mu_i^j \leftarrow 0$
6:     else
7:         $\mu_i^j \leftarrow +\infty$
8:     Attach $\mu_i^j$ to $m$

Figure 5: Pulse-driven execution on a distributed system.



the notation established for the precedence between events. For two distinct events $\xi_i$ and $\xi'_i$ at a single
node $i \in N$, $\xi_i \to \xi'_i$ indicates that $\xi_i$ is the latest event occurring at $i$ such that $\mathit{rank}_i(\mathit{pulse}_i(\xi_i)) <
\mathit{rank}_i(\mathit{pulse}_i(\xi'_i))$ (note that events at a same pulse are unrelated). Now let $\xi_i$ and $\xi_j$ be two events at nodes
$i$ and $j \in N(i)$, respectively. The notation $\xi_i \to \xi_j$ is re-defined in the pulse-driven context to indicate that
$\mathit{msg}(\xi_j) \in \mathit{MSG}_i(\lambda_i)$, where $\lambda_i = \mathit{pulse}_i(\xi_i)$. This condition says that $i$'s local state due to the occurrence
of $\lambda_i$ influences the local state of $j$ due to pulse $\lambda_j = \mathit{pulse}_j(\xi_j)$.

The precedence relation between the pulses of a pulse-driven computation is not determined by
event-triggering messages but is, instead, defined on the number of pulses generated by each local clock mechanism.
Naturally, if $\lambda_i$ and $\lambda'_i$ are two pulses at $i$, then we write $\lambda_i \to \lambda'_i$ to denote $\mathit{rank}_i(\lambda_i) = \mathit{rank}_i(\lambda'_i) - 1$. We
extend this notion almost directly to the case of pulses at different nodes: given $\lambda_i \in \Lambda_i$ and $\lambda_j \in \Lambda_j$ such
that $\mathit{rank}_j(\lambda_j) > 1$, $j \in N(i)$, we write $\lambda_i \to \lambda_j$ to mean that $\mathit{rank}_i(\lambda_i) = \mathit{rank}_j(\lambda_j) - 1$.

The relation $\rightsquigarrow$ is defined between events or between pulses based on the appropriate $\to$ like in the
event-driven framework. We say that $\lambda_i = \mathit{pred}_i(\lambda_j)$, for $i, j \in N$, if $\lambda_i$ is the latest pulse at $i$ such that $\lambda_i \rightsquigarrow \lambda_j$.
Note that $\mathit{rank}_i(\mathit{pred}_i(\lambda_j)) = \mathit{rank}_j(\lambda_j) - 1$ for $j \in N(i)$, provided $\mathit{rank}_j(\lambda_j) > 1$ (otherwise, $\mathit{pred}_i(\lambda_j)$
does not exist and we assume $\mathit{rank}_i(\mathit{pred}_i(\lambda_j)) = 0$). More generally, $\mathit{rank}_j(\lambda_j) - \mathit{rank}_i(\mathit{pred}_i(\lambda_j))$ is the
distance between $i$ and $j$ in $G$ so long as $\mathit{pred}_i(\lambda_j)$ exists (otherwise, $\mathit{rank}_i(\mathit{pred}_i(\lambda_j)) = 0$). The vector clock
$P_i(\ell_i)$ of node $i$ at pulse $\lambda_i$ is defined as the vector whose $j$th entry, for $j \in N$, is given by

$$P_i^j(\ell_i) = \begin{cases} \ell_i, & \text{if } j = i, \\ \mathit{rank}_j(\mathit{pred}_j(\lambda_i)), & \text{otherwise,} \end{cases} \qquad (6)$$

where $\ell_i = \mathit{rank}_i(\lambda_i)$. The definition of the global state $S_i(\ell_i)$ associated with vector clock $P_i(\ell_i)$ is
straightforward, as follows:

• for each node $j \in N$, $S_i^j(\ell_i)$ is the local state resulting from the pulse at $j$ whose rank is $P_i^j(\ell_i)$; and

• for each edge $i'j' \in E$, $S_i^{i' \to j'}(\ell_i)$ is the set of messages in transit from node $i' \in N$ to node $j' \in N(i')$,
i.e., those sent by $i'$ at a pulse ranking no more than $P_i^{i'}(\ell_i)$ and received by $j'$ through event $\xi_{j'}$ such
that $\mathit{pulse}_{j'}(\xi_{j'})$ ranks more than $P_i^{j'}(\ell_i)$; $S_i^{j' \to i'}(\ell_i)$ is defined similarly.
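Because the rank of a predecessor pulse depends only on graph distance, the pulse vector clock of (6) can be computed directly from $G$. The Python sketch below illustrates this; the function names and the adjacency-list representation of $G$ are assumptions made for the example.

```python
from collections import deque


def distances_from(graph, source):
    """Breadth-first distances in G from `source` (graph: node -> neighbors)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def pulse_vector_clock(graph, i, rank):
    """P_i(rank): entry j is rank - dist(i, j), or 0 if no predecessor exists."""
    dist = distances_from(graph, i)
    return {j: max(rank - d, 0) for j, d in dist.items()}


path = {1: [2], 2: [1, 3], 3: [2]}        # the path 1 - 2 - 3
print(pulse_vector_clock(path, 1, 2))     # {1: 2, 2: 1, 3: 0}
print(pulse_vector_clock(path, 1, 5))     # {1: 5, 2: 4, 3: 3}
```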

4.2 Synchronous ordering 

The strongest pulse and message ordering one may define is the synchronous ordering. This ordering requires
that the delay a message from pulse $\lambda_i$ at node $i \in N$ undergoes to be delivered to $i$'s neighbor $j$ be bounded
by the local times at $j$ at which the $(\ell_i = \mathit{rank}_i(\lambda_i))$th and $(\ell_i + 1)$th pulses occur [2]. Put differently, the
synchronous ordering imposes an order on the events with respect to pulses. Let $\lambda_j$ be a pulse at $j$ and let
$\ell_j = \mathit{rank}_j(\lambda_j)$. Then messages are ordered such that $\xi_i \to \xi_j$ only if $\lambda_i \to \lambda_j$ (i.e., $\ell_j - \ell_i = 1$), where $\xi_i$
and $\xi_j$ are events at $i$ and $j$, respectively, with $\mathit{pulse}_i(\xi_i) = \lambda_i$ and $\mathit{pulse}_j(\xi_j) = \lambda_j$. What is meant here is
that, by hypothesis, as many pulses as $\ell_i$ must have been performed so far at $j$ when a message in $\mathit{MSG}_i(\lambda_i)$
triggers an event at $j$. Under these conditions, the absence of messages between pulses $\lambda_i$ and $\lambda_j$ when
$\lambda_i \to \lambda_j$ provides information to node $j$: the local state of $i$ resulting from the occurrence of $\lambda_i$ is irrelevant
to the occurrence of $\lambda_j$.

A necessary condition for the synchronous ordering establishes the number of pulses that are required to
occur before the reception of each message, as follows:

Synchronous Delivery Condition (SDC): For $i \in N$, $j \in N(i)$, and $\xi_j$ the reception at $j$ of a message
sent by pulse $\lambda_i$, $\mathit{rank}_j(\mathit{pulse}_j(\xi_j)) \ge \mathit{rank}_i(\lambda_i) + 1$.

In addition to the postponing of message deliveries the SDC may cause, the synchronous ordering also
requires that some conditions be satisfied for pulse $\lambda_i$ to take place at $i \in N$. This stems from the fact that
the occurrence of $\lambda_i$ depends on the messages sent to $i$ from the pulses at $i$'s neighbors that rank less than
$\lambda_i$. Pulse $\lambda_i$ may only occur at $i$ when the following condition holds:

Synchronous Pulse Generation Condition (SPGC): For $i \in N$ and $j \in N(i)$, $S_i^{j \to i}(\mathit{rank}_i(\lambda_i)) = \emptyset$.

A natural question to ask is whether the SDC and the SPGC induce the synchronous ordering as claimed.
To see that this is indeed the case, consider pulse $\lambda_i$ at node $i \in N$ and let $\mathit{msg}_j \in \mathit{MSG}_i(\lambda_i)$ trigger event
$\xi_j$ at $j \in N(i)$. By the SDC, $\mathit{rank}_j(\lambda_j) \ge \mathit{rank}_i(\lambda_i) + 1$, with $\lambda_j = \mathit{pulse}_j(\xi_j)$. In addition, by the SPGC,
$\mathit{msg}_j$ must not be in transit in $S_j(\mathit{rank}_i(\lambda_i) + 1)$, which leads to $\mathit{rank}_j(\lambda_j) \le \mathit{rank}_i(\lambda_i) + 1$. Figure 6 is an
illustration of the situations that are forbidden by the SDC and the SPGC, with $\ell_i = \mathit{rank}_i(\lambda_i)$ and $\ell_j =
\mathit{rank}_j(\lambda_j)$. Implementations of the synchronous ordering are the celebrated $\alpha$, $\beta$, and $\gamma$ synchronizers [2, 12].

(a) The SDC is violated. (b) The SPGC is violated.

Figure 6: Two scenarios forbidden by the synchronous ordering.
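The two bounds combine into a single equality on ranks, which the following small Python sketch makes explicit. The function `classify_reception` is a hypothetical helper for checking a recorded trace, not part of any synchronizer.

```python
def classify_reception(send_rank, receive_rank):
    """Classify one message under the synchronous ordering.

    send_rank: rank of the pulse at i that sent the message.
    receive_rank: rank of the pulse at j associated with the receiving event.
    """
    if receive_rank < send_rank + 1:
        return "SDC violated"     # received too early, as in Figure 6(a)
    if receive_rank > send_rank + 1:
        return "SPGC violated"    # still in transit too late, as in Figure 6(b)
    return "ok"


print(classify_reception(3, 4))  # ok
print(classify_reception(3, 3))  # SDC violated
print(classify_reception(3, 6))  # SPGC violated
```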

An additional observation in connection with the synchronous ordering is that it also leads to a causal 
ordering of the messages in the sense of the re-definition of the precedence relation between events. It is a 
simple matter to check that, if a sequence of messages overtakes a single message, then either the SDC or 
the SPGC is violated. If the SDC is not violated, then the last message of the sequence is received at least 
two pulses after the pulse that originates the first message of the sequence (since the sequence has length at 
least 2). This means that the single message is received at least three pulses later, which violates the SPGC. 

4.3 Partially synchronized ordering and an algorithm 

The relaxation of the synchronous ordering is called the partially synchronous ordering and is defined through 
relaxations of the SDC and the SPGC. The delay of a message is bounded from below and above by integer 
functions, which affects the reception of messages and the generation of pulses as follows: 

Partially Synchronous Delivery Condition (PSDC): For $i \in N$, $j \in N(i)$, $\xi_j$ the reception at $j$ of a
message sent by pulse $\lambda_i$, and $\rho_i^j(\mathit{rank}_i(\lambda_i)) \ge 1$, $\mathit{rank}_j(\mathit{pulse}_j(\xi_j)) \ge \mathit{rank}_i(\lambda_i) + \rho_i^j(\mathit{rank}_i(\lambda_i))$.

Partially Synchronous Pulse Generation Condition (PSPGC): For $i \in N$, $j \in N(i)$, and $0 \le
\delta_i^j(\mathit{rank}_i(\lambda_i)) < \mathit{rank}_i(\lambda_i)$, $S_j^{j \to i}(\mathit{rank}_i(\lambda_i) - \delta_i^j(\mathit{rank}_i(\lambda_i))) \cap S_i^{j \to i}(\mathit{rank}_i(\lambda_i)) = \emptyset$.

The lower and upper bounds for the delay of a message are given in the PSDC and the PSPGC, respectively,
by $\rho_i^j(\mathit{rank}_i(\lambda_i))$ and $\delta_i^j(\mathit{rank}_i(\lambda_i))$. In the PSDC, a message from $\lambda_i$ is to be received by $j$ only after the
occurrence of a number of pulses at $j$ greater than $\mathit{rank}_i(\lambda_i)$ by a value determined by $i$ at pulse $\lambda_i$. In the
PSPGC, node $i$ determines, for $\lambda_i$, a rank at $j$ such that all messages sent by $j$ to $i$ before, and including,
the pulse of this rank must have been received before the occurrence of $\lambda_i$.

Some preliminary observations with respect to the combined implementation of the PSDC and the PSPGC
are as follows. Consider pulse $\lambda_i$ and let $\ell_i = \mathit{rank}_i(\lambda_i)$ at node $i \in N$. Every message sent by $\lambda_i$ to some
$j \in N(i)$ carries $\ell_i$ and $\rho_i^j(\ell_i)$ attached to it. A message $\mathit{msg}_i$ from node $j \in N(i)$ is accepted only when the
pulse rank $\ell_j$ attached to $\mathit{msg}_i$ is at most $\ell_i - \rho_j^i(\ell_j)$. Consequently, if $\mathit{msg}_i$ arrives at $i$ when $\ell_i < \ell_j + \rho_j^i(\ell_j)$,
then $i$ must wait until pulse $\ell_j + \rho_j^i(\ell_j)$ occurs before accepting $\mathit{msg}_i$. For this reason, $\mathit{msg}_i$ can only be
accepted by $i$ if a pulse ranking $\ell_i$ exists such that both $\ell_i - \delta_i^j(\ell_i) \le \ell_j$ and $\ell_i \ge \ell_j + \rho_j^i(\ell_j)$ or, equivalently,
such that $\rho_j^i(\ell_j) \le \ell_i - \ell_j \le \delta_i^j(\ell_i)$. Otherwise, there is no pulse at $i$ at which the reception of $\mathit{msg}_i$ satisfies
both the PSDC and the PSPGC. Therefore, one difficulty of implementing a partially synchronous ordering
is that the functions $\rho_j^i$, determined by $j$, and $\delta_i^j$, determined by $i$, must be compatible in the sense that
there is a pulse at $i$, for every message sent by $j$, at which it can be accepted. We henceforth assume that
this is the case.
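The compatibility requirement can be checked mechanically for a given pair of delay functions. The Python sketch below enumerates the pulse ranks at $i$ at which a message may be accepted; the function name `acceptable_pulses` and the `horizon` bound on the ranks examined are assumptions introduced for the example.

```python
def acceptable_pulses(l_j, rho, delta, horizon):
    """Ranks l_i in 1..horizon with rho <= l_i - l_j <= delta(l_i).

    l_j: rank of the sending pulse at j.
    rho: minimum delay attached by j to the message.
    delta: function giving the maximum delay chosen by i at each rank.
    """
    return [l_i for l_i in range(1, horizon + 1)
            if rho <= l_i - l_j <= delta(l_i)]


# Synchronous ordering: both bounds equal 1, a single acceptable pulse.
print(acceptable_pulses(3, 1, lambda l_i: 1, 10))   # [4]
# Incompatible choice: j demands a delay of 2, i tolerates at most 1.
print(acceptable_pulses(3, 2, lambda l_i: 1, 10))   # []
# A wider window leaves several acceptable pulses.
print(acceptable_pulses(3, 1, lambda l_i: 3, 10))   # [4, 5, 6]
```

An empty result within the horizon signals that the chosen delay functions leave no pulse at which the message can be accepted.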

In addition to message postponing, the implementation of the partially synchronous ordering involves the
control of pulse occurrences. In this context, one additional difficulty related to the combined implementation
of the PSDC and the PSPGC is that of handling the "absence of messages" between two pulses. The reason
for this is that the occurrence of $\lambda_i$ depends on the number of messages sent to $i$ from certain specific pulses
at $i$'s neighbors. Pulse $\lambda_i$ only occurs after every message sent by $j \in N(i)$ to $i$ in connection with the pulse
of rank $\ell_i - \delta_i^j(\ell_i)$ triggers an event at $i$.

Some of the variables, messages, and computation associated with Algorithm 8 are related to the control
of the execution. A variable $\ell_i$ is used to implement $i$'s local clock mechanism. Its initial value is 0 to indicate
that the first pulse at node $i$ has not yet occurred. The subset of $\mathit{MSG}_i(\lambda_i)$, where $\lambda_i \in \Lambda_i$, constituted
by the messages addressed to $j \in N(i)$ is represented simply by $\mathit{MSG}_i^j$. Two additional control variables
are used: $\mathit{safe}_i^j(\ell_j)$ stores a Boolean value indicating whether the control message associated with pulse
$\lambda_j$ at $j \in N(i)$ such that $\ell_j = \mathit{rank}_j(\lambda_j)$ has been received by $i$; $\mathit{pending}_i^j(\ell_j)$, whose initial value is 0,
is the number of messages sent by $\lambda_j$ and not yet received by $i$. The control messages affect the ordering
of the computation, which in turn may change the state of the control variables. This depends upon the
minimum delay $\rho_j^i(\mathit{rank}_j(\lambda_j))$ attached to every message sent by $\lambda_j$ by the application-specific procedure
SETMINIMUMDELAY$_j$ and retrieved by $i$ via function GETMINIMUMDELAY$_i$. The implementation of function
GETCURRENT$_i$ of Section 1 is not shown in Algorithm 8, since it simply returns $\ell_i$.

4.4 An application: systems of linear equations 

Let us return to the iterative algorithm for systems of linear equations described in Section 2.2. Recall that
$\mathit{ncolors} + \mathit{nresidual}$ pulses occur at $i \in N$ at each iteration $k$ of this algorithm, and that the computation of
$x_i[k]$ is accomplished as soon as all the information it needs from its neighbors is available. The minimum
and maximum delays to be used with the partially synchronized ordering of Algorithm 8 are set as shown in
Algorithm 9 in order to ensure that $x_i[k]$ is computed in one of the first $\mathit{ncolors}$ pulses of iteration $k$ without
interfering with the computation of the residual. These delays are determined based on a coloring in which
the colors are numbered from the set $\{0, 1, \ldots, \mathit{ncolors} - 1\}$ and are such that the computations at an iteration
are performed in increasing order of the colors in this set. Let $\mathit{dist}(a, b)$ be the "circular distance" between
two colors $a$ and $b$, given by $b - a$, if $b > a$, or $b + \mathit{ncolors} - a$, otherwise. At iteration $k$, the maximum
delay $\delta_i^j(k(\mathit{ncolors} + \mathit{nresidual}) + c(i))$, for $j \in N(i)$, is determined so that the computation of $x_i[k]$ is
accomplished in a pulse no later than the one ranked $k(\mathit{ncolors} + \mathit{nresidual}) + c(i)$. The maximum delay is
then given by $\mathit{dist}(c(j), c(i))$ for this pulse, which ensures that $x_j[k]$, if $j \in C(i)$, and $x_j[k - 1]$, if $j \notin C(i)$,

Algorithm 8 Implementation of the PSDC and the PSPGC: message sending in lines 4 and 9 of Algorithm 2
for node $j$, reception of a message by node $i$, and the local clock mechanism of node $i$.

Sending, by node $j$, of the messages in $\mathit{MSG}_j(\lambda_j)$:
 1: for all $i \in N(j)$ do
 2:     Initialize control message $m$ with $\ell_j$ and $|\mathit{MSG}_j^i|$
 3:     Send $m$
 4:     for all message $\mathit{msg}_j \in \mathit{MSG}_j^i$ do
 5:         Attach $\ell_j$ to $\mathit{msg}_j$
 6:         SETMINIMUMDELAY$_j(\mathit{msg}_j, \ell_j, i)$
 7:         Send $\mathit{msg}_j$

 8: function HASADVANCED$_i$
 9:     while a control message $m$ from $j \in N(i)$ exists with attachments $\ell_j$ and $|\mathit{MSG}_j^i|$ do
10:         $\mathit{safe}_i^j(\ell_j) \leftarrow$ true
11:         Increment $\mathit{pending}_i^j(\ell_j)$ by $|\mathit{MSG}_j^i|$
12:     if $\mathit{safe}_i^j(\ell_i - \delta_i^j(\ell_i))$ and $\mathit{pending}_i^j(\ell_i - \delta_i^j(\ell_i)) = 0$ for all $j \in N(i)$ then
13:         Increment $\ell_i$ by 1
14:         return true
15:     return false

Upon arrival of message $\mathit{msg}_i$ from node $j$ at node $i$ with attachment $\ell_j$:
16: $\rho_j^i \leftarrow$ GETMINIMUMDELAY$_i(\mathit{msg}_i)$
17: if $\ell_i \ge \ell_j + \rho_j^i$ then
18:     Decrement $\mathit{pending}_i^j(\ell_j)$ by 1
19:     Deliver $\mathit{msg}_i$
20: else
21:     Postpone the delivery of $\mathit{msg}_i$

Upon the occurrence of any pulse at node $i$:
22: while an undelivered message $\mathit{msg}_i$ exists satisfying the delivery condition of line 17 do


23:     Execute lines 16-19

Algorithm 9 Delays at node $i$, for $j \in N(i)$ and the colors $c(i)$ and $c(j)$.

 1: if $\ell_i = k(\mathit{ncolors} + \mathit{nresidual}) + c(i)$ then
 2:     if $c(j) < c(i)$ then
 3:         $\delta_i^j(\ell_i) \leftarrow \mathit{dist}(c(j), c(i))$
 4:     else
 5:         $\delta_i^j(\ell_i) \leftarrow \mathit{dist}(c(j), c(i)) + \mathit{nresidual}$
 6: else
 7:     Increment $\delta_i^j(\ell_i)$ by 1

 8: procedure SETMINIMUMDELAY$_i(m, \ell_i, j)$
 9:     if $c(j) > c(i)$ then
10:         Attach 1 to $m$
11:     else
12:         Attach $\mathit{ncolors} + \mathit{nresidual} - \ell_i \bmod (\mathit{ncolors} + \mathit{nresidual})$ to $m$

13: function GETMINIMUMDELAY$_i(m)$
14:     return the minimum delay attached to $m$



are available to $i$ when needed at iteration $k$. The minimum delay is set in procedure SETMINIMUMDELAY$_i$
of Algorithm 9 in such a way that $x_i[k]$ is received by $j \in N(i)$ such that $c(j) < c(i)$ in a pulse no earlier
than the first pulse of iteration $k + 1$, which only occurs after the pulses devoted to the computation of the
residual of iteration $k$.
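The delay computations of Algorithm 9 can be illustrated with a minimal Python sketch. The function names and the explicit `ncolors` and `nresidual` parameters are choices made for this example; the sketch covers only the pulse at which $x_i[k]$ is computed and the minimum delay attached at sending time, not the incremental update of the maximum delay at other pulses.

```python
def dist(a, b, ncolors):
    """Circular distance from color a to color b."""
    return b - a if b > a else b + ncolors - a


def max_delay_at_compute_pulse(c_i, c_j, ncolors, nresidual):
    """delta_i^j at the pulse ranked k*(ncolors + nresidual) + c(i)."""
    if c_j < c_i:
        return dist(c_j, c_i, ncolors)            # waits for x_j[k]
    return dist(c_j, c_i, ncolors) + nresidual    # waits for x_j[k - 1]


def min_delay(l_i, c_i, c_j, ncolors, nresidual):
    """rho_i^j attached by i to a message sent to j at pulse rank l_i."""
    if c_j > c_i:
        return 1                                  # usable later in iteration k
    period = ncolors + nresidual
    return period - l_i % period                  # held until iteration k + 1


ncolors, nresidual = 3, 2
print(dist(0, 2, ncolors), dist(2, 0, ncolors))          # 2 1
print(max_delay_at_compute_pulse(2, 0, ncolors, nresidual))  # 2
print(max_delay_at_compute_pulse(0, 2, ncolors, nresidual))  # 3
# Node with color 2 sends at pulse 7 (iteration 1) to a lower-colored
# neighbor: the message is held until pulse 7 + 3 = 10, the first pulse
# of iteration 2.
print(min_delay(7, 2, 0, ncolors, nresidual))             # 3
```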

5 Concluding remarks 

Partially ordering the executions of a distributed algorithm is a mechanism to restrict its set of executions. 
In several cases, this set of restricted executions comprises more efficient executions than its complement. 
We have presented partially ordered executions of a distributed algorithm as the executions satisfying some 
restricted orders of their actions in two different frameworks, those of event- and pulse-driven computations. 
In the event-driven framework, we have given new conditions for message delivery that generalize the ones 
leading to FIFO ordering and to causal ordering. An important property of these generalized conditions 
is that they can be dynamically tuned to become more or less strict as the computation evolves. The 
same principle has been applied to the pulse-driven framework, in which case a constraint on the relation 
between the pulse generation mechanism and message delivery is established to generalize the well-known 
fully synchronous ordering. 

The algorithm which partially orders the executions in each case may introduce some overhead, which is 
large to the same extent that the order restrictions are strict (the extremal case is that of causal ordering, in 
the event-driven framework, or that of fully synchronous ordering, in the pulse-driven case). Efficient
implementations are application-dependent and correspond to those implementations that lead to computations
likely to be efficient with limited overhead. For instance, an efficient implementation of the partially ordered 
version of randomized distributed backtrack search will provide a satisfactory trade-off between the number 
of messages postponed and the number of unsuccessful donation requests. In the same vein, an implemen- 
tation of the distributed iterative algorithm for systems of linear equations will be efficient when the gain 
in the number of iterations surpasses the overhead of ordering the execution. We expect that systematic 
experimentation on real-world instances of both problems will yield crucial insight into the most appropriate
choices. 

References 

[1] F. Adelstein and M. Singhal. Real-time causal ordering in multimedia systems. Telecommunication
Systems, 7:59-74, 1997.

[2] V. C. Barbosa. An Introduction to Distributed Algorithms. The MIT Press, Cambridge, MA, 1996.

[3] D. Bertsekas and J. Tsitsiklis. Parallel and Distributed Computation: Numerical Methods. Prentice-Hall
International, Englewood Cliffs, NJ, 1989.

[4] K. P. Birman and T. A. Joseph. Reliable communication in the presence of failures. ACM Transactions
on Computer Systems, 2(1):39-59, 1987.

[5] K. P. Birman, A. Schiper, and P. Stephenson. Lightweight causal and atomic group multicast. ACM
Transactions on Computer Systems, 9(3):272-314, 1991.

[6] B. Charon-Bost. Concerning the size of logical clocks in distributed systems. Information Processing
Letters, 39(6):11-16, 1991.

[7] R. C. Correa and A. Ferreira. On the effectiveness of parallel branch and bound. Parallel Processing
Letters, 5(3):375-386, 1995.

[8] R. C. Correa and J. L. Szwarcfiter. On extensions, linear extensions, upsets and downsets of ordered
sets. Discrete Mathematics, 295(1-3):13-30, 2005.

[9] S. Dobrev, P. Flocchini, G. Prencipe, and N. Santoro. Searching for a black hole in arbitrary networks:
optimal mobile agent protocols. Distributed Computing, 19(1):1-18, 2006.

[10] S. Dobrev, R. Kralovic, N. Santoro, and W. Shi. Black hole search in asynchronous ring using tokens.
In Proceedings of the International Conference on Algorithms and Complexity (CIAC06), volume 3998
of Lecture Notes in Computer Science, pages 139-150, 2006.

[11] L. M. A. Drummond and V. C. Barbosa. On reducing the complexity of matrix clocks. Parallel
Computing, 29(7):895-905, 2003.

[12] V. Garg. Concurrent and Distributed Computing in Java. Wiley Interscience, Hoboken, NJ, 2004.

[13] R. M. Karp and Y. Zhang. Randomized parallel algorithms for backtrack search and branch-and-bound
computation. Journal of the ACM, 40(3):765-789, 1993.

[14] A. D. Kshemkalyani and M. Singhal. Necessary and sufficient conditions on information for causal
message ordering and their optimal implementation. Distributed Computing, 11(2):91-111, 1998.

[15] V. Kumar, A. Grama, A. Gupta, and G. Karypis. Introduction to Parallel Computing: Design and
Analysis of Algorithms. The Benjamin/Cummings Publishing Company, Inc., 1994.

[16] L. Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the
ACM, 21(7):558-565, 1978.

[17] G. Li and B. W. Wah. Coping with anomalies in parallel branch-and-bound algorithms. IEEE
Transactions on Computers, C-35(6):568-573, June 1986.

[18] N. Lynch. Distributed Algorithms. Morgan Kaufmann Publishers, 1996.

[19] L. L. Peterson, N. C. Bucholz, and R. D. Schlichting. Preserving and using context information in
interprocess communication. ACM Transactions on Computer Systems, 7(3):217-246, 1989.

[20] A. Ranade. A simpler analysis of the Karp-Zhang parallel branch-and-bound method. Technical Report
UCB/CSD-90-586, EECS Department, University of California, Berkeley, 1990.

[21] M. Singhal and A. D. Kshemkalyani. An efficient implementation of vector clocks. Information
Processing Letters, 43(1):47-52, 1992.






